Pages
Shape Links
Shape Properties
OCR and OpenAI Processes
OCR and OpenAI Processes
Architecture
Architecture
Document Intelligence - Classification
Document Intelligence - Classification
Manual classification
Manual classification
File Architecture
File Architecture
Process flow
Process flow
Decision Tree
Decision Tree
Phase
Phase
External/Output
External/Output
OCR - Code
OCR - Code
Azure Resource
Azure Resource
Sample - uncluster
Sample - uncluster
Sample - Label
Sample - Label
Schema1
Schema1
Schema2
Schema2
Schema...
Schema...
Schema28
Schema28
Population - uncluster
Population - uncluster
Population - cluster
Population - cluster
Extraction Model
Extraction Model
Unknown
Unknown
Template 2
Template 2
Template 3
Template 3
Template 4
Template 4
Template 1
Template 1
Population - Cluster
Population - Cluster
Schema1
Schema1
Schema2
Schema2
Schema...
Schema...
Schema43
Schema43
Sample - Label
Sample - Label
Schema1
Schema1
Schema2
Schema2
Schema...
Schema...
Schema43
Schema43
Population - Cluster
Population - Cluster
Schema1
Schema1
Schema2
Schema2
Schema...
Schema...
Schema43
Schema43
Population - uncluster
Population -
uncluster
template1
template1
template2
template2
template99
template99
template1
template1
template2
template2
template99
template99
trainingsamples
trainingsamples
population
population
pdf
pdf
websiteinfo
websiteinfo
json
json
jsondata
jsondata
jsonmodified
jsonmodified
json
json
pdf
pdf
jsondata
jsondata
jsonresponses
jsonresponses
Storage Account
Storage Account
OpenAI
OpenAI
Document Intelligence Script
Document
Intelligence Script
Delta Migration
Delta Migration
SQL Warehouse
SQL Warehouse
QA/QC
QA/QC
Prompts
Prompts
Same as PDFs
Same as PDFs
Each document has a unique document ID from Project ID extrac...
Each document has a unique
document ID from Project ID
extracted by OpenAI
Same names as PDFs
Same names as PDFs
Unique document names
Unique document names
Population - cluster
Population - cluster
Unknown
Unknown
Template 2
Template 2
Template 3
Template 3
Template 4
Template 4
Template 1
Template 1
Extraction Model
Extraction Model
websiteinfo
websiteinfo
OpenAI Script
OpenAI Script
OpenAI – Code
OpenAI – Code
review
review
jsondata
jsondata
jsonmodified (Clone)
jsonmodified
(Clone)
1:1
1:1
Reads/writes
Reads/writes
overwritten
overwritten
Listener 1:1
Listener
1:1
Business Intelligence Tools
Business
Intelligence
Tools
Frequent access
Frequent
access
Raw PDFs
Raw PDFs
Document Intelligence
Document
Intelligence
creates
creates
access
access
access
access
Step 2: Train Classification Model
Step 2: Train Classification
Model
Step 1: Identify minimum of 10 samples for each schema
Step 1: Identify minimum of
10 samples for each schema
Step 2b: Upload
Step 2b: Upload
Within each model, select required fields
Within each
model, select
required fields
Step 4: Run Extraction Model
Step 4: Run Extraction
Model
Step 3: Run Classification Model
Step 3: Run Classification
Model
Step 1: Manually Figure out Schemas/Formats
Step 1: Manually Figure out
Schemas/Formats
Step 2: Train Classification Model
Step 2: Train Classification
Model
Step 2b: Upload
Step 2b: Upload
Step 3: Run Classification Model
Step 3: Run Classification
Model
OCR
OCR
OpenAI
OpenAI
Document type
Document type
Azure Blob Storage Account Container
Azure Blob
Storage
Account
Container
Databricks
Databricks
SQL Warehouse
SQL Warehouse
Raw PDFs
Raw PDFs
jsondata
jsondata
Event Grid
Event Grid
Trigger on new PDFs uploaded
Trigger on new PDFs uploaded
creates
creates
jsonresponses
jsonresponses
creates
creates
WebApps
WebApps
jsonmodified
jsonmodified
references
references
updates
updates
Manual clone
Manual clone
OpenAI (SSC)
OpenAI (SSC)
...
...
...
...
...
...
...
...
...
...
Within each model, select required fields
Within each
model, select
required fields
Step 4: Run Extraction Model
Step 4: Run Extraction
Model
Want to perform analysis on data.
Want to perform analysis
on data.
Is the document digitized?
Is the document
digitized?
What is the document format?
What is the
document format?
Not a supported format. Research phase to convert videos into...
Not a supported format.
Research phase to convert
videos into frames
Video
Video
Format is supported; still research phase
Format is supported; still
research phase
Image
Image
Is OpenAI integration required?
Is OpenAI
integration
required?
Text
Text
Access Data in SQL Warehouse
Access Data in SQL
Warehouse
No
No
Is source data Protected-B or above?
Is source data
Protected
-B or
above?
Yes
Yes
OpenAI not cleared by IT-SEC to use sensitive data
OpenAI not cleared by IT-
SEC to use sensitive data
Yes
Yes
Process large amount of documents with OpenAI?
Process large amount of
documents with OpenAI?
No
No
OpenAI web-chatbot Uses web interface to ask individual quest...
OpenAI web-chatbot Uses
web interface to ask
individual questions for a
given document.
No
No
Are prompts pre-determined?
Are prompts pre-
determined?
OpenAI-Script can group large amounts of documents for a give...
OpenAI-Script can group
large amounts of
documents for a given set
of prompts and process all
documents at once.
OpenAI WebApp uses web interface to upload a document with a ...
OpenAI WebApp uses
web interface to upload a
document with a set of
prompts and get OpenAI
responces
FoSx-SP-Waayback-LethalIndigowingedparrot
FoSx-SP-Waayback-
LethalIndigowingedparro
t
PSSI-OpenAI-RG
PSSI-OpenAI-RG
Storage Account
Storage Account
(pssidatalake)
(pssidatalake)
Web App
Web App
(pssi-openAI-prompts)
(pssi-openAI-prompts)
Document Intelligence
Document
Intelligence
(pssi-prebult-models)
(pssi-prebult-
models)
Data Factory
Data Factory
(pssi-pipelines)
(pssi-pipelines)
Databricks
Databricks
(pssi-openai-databricks)
(pssi-openai-
databricks)
Web App
Web App
(pssi-prd-pstb-rcoe-gc)
(pssi-prd-pstb-rcoe-gc)
Web App
Web App
(pssi-prd-emb-ispe-planningliterature)
(pssi-prd-emb-ispe-
planningliterature)
SSC Directory
SSC Directory
DFO Directory
DFO Directory
SQL Warehouse
SQL Warehouse
(emb-ipse)
(emb-ipse)
SQL Warehouse
SQL Warehouse
(pstb-rcoe)
(pstb-rcoe)
Azure OpenAI
Azure
OpenAI
(pssi-openai)
(pssi-openai)
SC2G - PROD ProB
SC2G - PROD ProB
EDH-PSSI-PROD-RG
EDH-PSSI-PROD-
RG
Storage Account
Storage Account
(stpssiprd)
(stpssiprd)
Document Intelligence
Document
Intelligence
(pssi-doc-ai-prd)
(pssi-doc-ai-prd)
Data Factory
Data Factory
(adfpsssiprdinnovation)
(adfpsssiprdinnovatio
n)
Web App
Web App
(pssi-openAI-chatbot)
(pssi-openAI-
chatbot)
EDH-PROD-RG
EDH-PROD-RG
Blob Container
Blob Container
(fm-rec-fishslips)
(fm-rec-fishslips)
Blob Container
Blob Container
(science-stockassesment-sil)
(science-stockassesment-sil)
Blob Container
Blob Container
(rm-dml-licenses-logbooks)
(rm-dml-licenses-logbooks)
SQL Warehouse
SQL Warehouse
(science-stockassesment)
(science-stockassesment)
SQL Warehouse
SQL Warehouse
(emb-ffhpp)
(emb-ffhpp)
SQL Warehouse
SQL Warehouse
(rm-dml)
(rm-dml)
SQL Warehouse
SQL Warehouse
(fm-rec)
(fm-rec)
Note: Document intelligence currently does not have the newes...
Note:
Document intelligence currently does not have the newest API avalible for region
"Central Canada" which outputs a confidence
score for extracted table data.
Note: Document intelligence currently does not have the newes...
Note
: Document intelligence currently does not have the newest API avalible for region "Central Canada" which
outputs a confidence score for extracted table data.
Note
: OpenAI cannot be deployed in the DFO directory hence all of the OpenAI related instances in SSC
Missing: Access to Databricks in EDH-PROD-RG with permissions...
Missing
: Access to Databricks in EDH
-
PROD
-
RG with
permissions to interact with Document Intelligence,
Data Factory, and Storage Account from EDH
-
PSSI
-
PROD
-
RG
Missing
: Instances of SQL Warehouses for each project
having data processed in EDH
-
PSSI
-
PROD
SQL Warehouse
SQL Warehouse
(qcfm-rec)
(qcfm-rec)
Databricks
Databricks
Web App
Web App
(pssi-prd-ocr-qcfm-rec)
(pssi-prd-ocr-qcfm-
rec)
Web App
Web App
(pssi-prd-ocr-emb-ffhpp)
(pssi-prd-ocr-emb-
ffhpp)
Web App
Web App
(pssi-prd-ocr-fm-rec)
(pssi-prd-ocr-fm-rec)
Web App
Web App
(pssi-prd-ocr-science-stockassesment)
(pssi-prd-ocr-
science
-
stockassesment)
Blob Container
Blob Container
(emb-ffhpp-complicance-inspectionreports)
(emb-ffhpp-complicance-
inspectionreports)
Blob Container
Blob Container
(qcfm-rec-seaobserver-logbooks-dockside-purchaseslips)
(qcfm-rec-seaobserver-
logbooks
-dockside-
purchaseslips)
Blob Container
Blob Container
(emb-ipse-planningliterature)
(emb-ipse-
planningliterature)
Blob Container
Blob Container
(pstb-rcoe-gc)
(pstb-rcoe-gc)
creates
creates
access
access
jsondata
jsondata
creates
creates
references
references
Document Intelligence Classification & Extraction Process (Cu...
Document Intelligence
Classification & Extraction
Process (Custom)
QA/QC Website interface
QA/QC Website interface
Digitized
Digitized
Document Intelligence Extraction (Pre-build)
Document Intelligence
Extraction (Pre
-build)
Yes
Yes
Yes
Yes
No
No
Yes
Yes
Legend: - In house Application - SAS
Legend:
-
In house Application
-
SAS
Web App
Web App
pssi-openai-analyzer-prd-01
pssi-openai-analyzer-
prd
-01
Typed
Typed
Clear Formatting and Consistent Layout
Clear Formatting and
Consistent Layout
Select all fields
Select all fields
Web Page
Web Page
Field: verified (yes/no)
Field: verified
(yes/no)
Databricks – SQL Warehouse
Databricks – SQL Warehouse
Field: verified (yes/no)
Field: verified
(yes/no)
Table 2
Table 2
Table1
Table1
Table 3
Table 3
No
No
No
No
Yes
Yes
No
No
Yes
Yes
Yes
Yes
No
No